NOMBAS - A Bayesian Procedure for Selecting the Greatest Mean

by
Alan R. Washburn

Introduction: Suppose that an experimenter must choose one category out of k after making a limited number of performance tests. The experimenter's goal is to select the category with the greatest mean performance. The categories could represent anything from competing aircraft designs to feed supplements; whatever the interpretation, the statistical problem is usually referred to as one of "greatest mean selection". Several testing procedures are available in the literature [2,6,8]. The purpose of this paper is to propose a new one (NOMBAS) and compare it with certain others.

If the experimenter were to test each category a fixed number of times, he would typically discover at the end of testing that some of the categories have experimental means so small that he would regret having tested them so much. This suggests that substantial gains might be possible by using a sequential procedure wherein the category to be tested next, and perhaps even the decision to stop testing, depend on the results achieved so far. This is what we have in mind. More precisely, NOMBAS is a procedure where at every stage the mean performance of each category is regarded as a normal random variable. Initial values for the mean and variance of the mean performance of each category must be provided by the experimenter. Whenever testing stops, the experimenter simply selects the category with the largest current mean. If testing is to be continued, the experimenter tests the category for which the expected gain from one more test is maximal; this procedure is "myopic" because it looks only one test ahead, even though there will typically be several tests yet to be made. If each test involves a normally distributed experimental error, then it is elementary to apply Bayes' Theorem to obtain "revised" values for the mean and variance of the tested category, after which the procedure is repeated until finally the decision to stop testing is made. All this will be formalized below; our hope at this point is merely to have explained the source of the acronym: NOrmal Myopic BAyes Sequential procedure.

In making Bayesian calculations based on normal distributions, we are following [14]. The pervasive assumption of normality is perhaps not as restrictive as it might seem at first sight. Recall that the experimenter's purpose is to select the category with the greatest mean. If testing consists of making a sequence of independent observations, then it is inevitable that the choice of which category to select will be based on the experimental means of the observations for each category. By the Central Limit Theorem, the experimental means themselves, being averages of independent random variables, tend to be normal even if the individual observations are not. So there is reason to hope that the NOMBAS procedure may be robust with respect to deviations from normality. This is one of the issues that will be explored numerically below, but first we will describe NOMBAS in more detail.

The NOMBAS procedure

Let $\theta_i$ be the mean performance of category $i$. For all $i$, we assume that $\theta_i$ is normal with mean $\theta_{i0}$ and variance $\sigma_{i0}^2$. Let $\theta_{ij}$ and $\sigma_{ij}^2$ be the mean and variance of $\theta_i$ given the results of the first $j$ tests, $j \ge 1$. If the $j$th test is made on category $i$, we assume that the observed result of that test is $Z_j = \theta_i + W_j$, where $W_j$ is normal with mean 0 (this is no loss of generality) and known variance $s_{ij}^2$, and independent of $\theta_1, \ldots, \theta_k, W_1, \ldots, W_{j-1}$. By using either Bayes' Theorem or the update equations of a Kalman filter [5], one can show that for the category tested and $j \ge 1$,

(1) $\theta_{ij} = \rho_{ij}\,\theta_{i,j-1} + (1 - \rho_{ij})\,Z_j$,

(2) $\sigma_{ij}^2 = \rho_{ij}\,\sigma_{i,j-1}^2$, where

(3) $\rho_{ij} = s_{ij}^2/(s_{ij}^2 + \sigma_{i,j-1}^2)$.

For any category $i$ not tested on test $j$, $\theta_{ij} = \theta_{i,j-1}$ and $\sigma_{ij}^2 = \sigma_{i,j-1}^2$. Furthermore, conditional on the results of the first $j$ tests, all of the $\theta_i$ are normal and independent of each other.
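As a concrete check on (1)-(3), here is a minimal Python sketch of one update step. The function name `bayes_update` and the numbers used (a standard normal prior, s = .5, observed results 0.8 and 1.4) are illustrative assumptions, not part of the original study. The sketch also confirms that two tests with error variance s² lead to the same posterior as a single test of their average with variance s²/2.

```python
def bayes_update(mean, var, z, s2):
    """Posterior of theta_i after one test, per (1)-(3).

    mean, var -- theta_{i,j-1} and sigma^2_{i,j-1} (current normal posterior)
    z         -- observed result Z_j = theta_i + W_j
    s2        -- known error variance s^2_{ij} of this test
    """
    rho = s2 / (s2 + var)                      # (3)
    new_mean = rho * mean + (1.0 - rho) * z    # (1): precision-weighted average
    new_var = rho * var                        # (2)
    return new_mean, new_var


# Two tests of one category: standard normal prior, s = .5 (s^2 = .25).
m, v = 0.0, 1.0
for z in (0.8, 1.4):
    m, v = bayes_update(m, v, z, 0.25)

# One test of the average (0.8 + 1.4)/2 with error variance .25/2.
m_avg, v_avg = bayes_update(0.0, 1.0, (0.8 + 1.4) / 2.0, 0.25 / 2.0)

print(round(m, 4), round(v, 4))          # 0.9778 0.1111
print(round(m_avg, 4), round(v_avg, 4))  # 0.9778 0.1111
```

The posterior variance 0.1111 equals s²/(s² + m) with m = 2, the expression used below for the FIXED procedure.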

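If the $j$th test is the last one, then NOMBAS selects category $*$, where $\theta_{*j} = \max_i \theta_{ij}$. If exactly one more test of category $i \ne *$ were made, the gain from that test would be $G_{ij} = \max(0, \theta_{i,j+1} - \theta_{*j})$, since the larger of $\theta_{i,j+1}$ and $\theta_{*j}$ would be selected after the test. Given the results of the first $j$ tests, $\theta_{i,j+1} - \theta_{*j}$ is normal with mean $\theta_{ij} - \theta_{*j}$ and variance $(1 - \rho_{i,j+1})^2(\sigma_{ij}^2 + s_{i,j+1}^2) = \sigma_{ij}^4/(\sigma_{ij}^2 + s_{i,j+1}^2)$, because $\theta_{i,j+1} - \theta_{ij} = (1 - \rho_{i,j+1})(Z_{j+1} - \theta_{ij})$ and $Z_{j+1}$ has conditional variance $\sigma_{ij}^2 + s_{i,j+1}^2$. So the expected value of $G_{ij}$ is

(4) $g_{ij} \equiv E(G_{ij}) = \tilde\sigma_{ij}\,F(\delta_{ij}/\tilde\sigma_{ij})$, where

(5) $\tilde\sigma_{ij} = \sigma_{ij}^2/\sqrt{\sigma_{ij}^2 + s_{i,j+1}^2}$,

(6) $\delta_{ij} = \theta_{*j} - \theta_{ij}$, and

(7) $F(y) = \int_y^\infty (x - y)\,d\Phi(x)$ (see the last section),

with $\Phi$ the standard normal distribution function. Equations (4)-(7) also hold for $i = *$, provided $\delta_{*j}$ is taken to be the (non-negative) difference between the largest and second largest of the $\theta_{ij}$, $i = 1, \ldots, k$.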
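The gain formulas (4)-(6) are straightforward to evaluate once F is available. The sketch below is a hypothetical helper, not code from the original study; it evaluates F exactly through the standard normal tail (`math.erfc`) rather than through the approximation (17) of the last section, and it assumes at least two categories so that a runner-up exists.

```python
import math


def F(y):
    """(7), evaluated through (11): F(y) = phi(y) - y * (1 - Phi(y))."""
    phi = math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)
    upper_tail = 0.5 * math.erfc(y / math.sqrt(2.0))   # 1 - Phi(y)
    return phi - y * upper_tail


def expected_gains(means, variances, s2_next):
    """Expected one-test gains g_ij of (4)-(6) for every category i (needs k >= 2).

    means[i], variances[i] -- theta_ij and sigma^2_ij after the first j tests
    s2_next[i]             -- error variance s^2_{i,j+1} of a next test on category i
    """
    ranked = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
    star, runner_up = ranked[0], ranked[1]
    gains = []
    for i, (m, v) in enumerate(zip(means, variances)):
        # (6); for i = * use the gap between the largest and second largest means.
        delta = means[star] - (means[runner_up] if i == star else m)
        denom = math.sqrt(v + s2_next[i])
        sigma_t = v / denom if denom > 0 else 0.0                            # (5)
        gains.append(sigma_t * F(delta / sigma_t) if sigma_t > 0 else 0.0)  # (4)
    return gains


# The k = 3 example used later to show that NOMBAS is not optimal.
print([round(g, 3) for g in expected_gains([0.0, 1.0, 1.0], [2.0, 0.5, 0.0], [1.0] * 3)])
```

Run on the k = 3 example of the section on non-optimality (prior means (0, 1, 1), prior variances (2, .5, 0), s = 1), it prints [0.123, 0.163, 0.0]; the first value matches the g₁₀ = .123 quoted there, while the second differs from the quoted .162 only in the third decimal.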


We now distinguish two versions of the NOMBAS procedure. NOMBASN makes exactly $n$ tests, with test $j$ being on the category for which $g_{i,j-1}$ is largest. NOMBASG stops testing unless $\max_i g_{i,j-1} \ge g > 0$, in which case the $j$th test is made on the category for which $g_{i,j-1}$ is largest. Each procedure has a parameter that determines when to stop: $n$ in the case of NOMBASN and $g$ in the case of NOMBASG.
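Putting (1)-(7) together, one replication of either version can be simulated as below. This is a sketch under stated assumptions: every test has the same error variance s², test results are drawn with Python's `random` module (not the LLRANDOM generator used for the figures), F is evaluated exactly rather than by (17), and the function and argument names (`nombas`, `g_cut`, and so on) are hypothetical.

```python
import math
import random


def F(y):
    """F(y) = phi(y) - y * (1 - Phi(y)), eqs. (7)/(11)."""
    phi = math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)
    return phi - y * 0.5 * math.erfc(y / math.sqrt(2.0))


def gains(means, variances, s2):
    """g_ij of (4)-(6), assuming a common error variance s2 for every test."""
    ranked = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
    star, runner_up = ranked[0], ranked[1]
    out = []
    for i, (m, v) in enumerate(zip(means, variances)):
        delta = means[star] - (means[runner_up] if i == star else m)
        sigma_t = v / math.sqrt(v + s2)
        out.append(sigma_t * F(delta / sigma_t) if sigma_t > 0 else 0.0)
    return out


def nombas(theta, prior_means, prior_vars, s, n=None, g_cut=None, rng=None):
    """Simulate NOMBASN (exactly n tests) or NOMBASG (gain cutoff g_cut).

    theta holds the true means; each test on category i is simulated as
    theta[i] + N(0, s^2).  Returns (selected index, number of tests, loss L).
    """
    if (n is None) == (g_cut is None):
        raise ValueError("give exactly one of n (NOMBASN) or g_cut (NOMBASG)")
    rng = rng or random.Random()
    means, variances = list(prior_means), list(prior_vars)
    s2, tests = s * s, 0
    while n is None or tests < n:
        g = gains(means, variances, s2)
        i = max(range(len(g)), key=g.__getitem__)   # myopic choice: largest g
        if g_cut is not None and g[i] < g_cut:       # NOMBASG stopping rule
            break
        z = theta[i] + rng.gauss(0.0, s)             # simulated result Z_j
        rho = s2 / (s2 + variances[i])               # (3)
        means[i] = rho * means[i] + (1.0 - rho) * z  # (1)
        variances[i] *= rho                          # (2)
        tests += 1
    pick = max(range(len(means)), key=means.__getitem__)
    return pick, tests, max(theta) - theta[pick]


# One replication in the setting of Figure 1: k = 10, theta_i ~ N(0, 1), s = .5.
rng = random.Random(2024)
theta = [rng.gauss(0.0, 1.0) for _ in range(10)]
print(nombas(theta, [0.0] * 10, [1.0] * 10, 0.5, n=30, rng=rng))
print(nombas(theta, [0.0] * 10, [1.0] * 10, 0.5, g_cut=1e-6, rng=rng))
```

Averaging the returned loss and number of tests over many replications (the figures use 5000) estimates E(L) and E(N) for a given n or g.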

Selection of Competing Procedures

Testing procedures for the greatest mean selection problem can be roughly categorized according to whether the number of tests performed is fixed or random, and also according to whether the order of testing is fixed or random. Let us adopt the notation RF for procedures where the number of tests is random but the order is (or could be) fixed, etc. An example of an FF procedure is the procedure of testing each category a fixed number of times and then selecting the category with the largest experimental mean [1]. Examples of RF procedures are those of Bechhofer, Kiefer, and Sobel [2], and also Blumenthal [3]. NOMBASN is the only FR procedure known to the author. The procedures of Paulson [11] and Stein [13] each involve the idea of eliminating certain categories as testing proceeds; like NOMBASG, they are RR procedures. Since the RR procedures were expected to dominate the other classes, all three of the RR procedures were compared. The other two (there were five in total) were NOMBASN and the FF procedure called FIXED. We describe FIXED, PAULSON, and STEIN in detail below. The five procedures will be compared by showing how the Bayes risk depends on the average sample number for each. Specifically, let $I$ be the index selected, let $L = \max_i \theta_i - \theta_I$, and let $N$ be the number of tests. Then $E(L)$ is the Bayes risk and $E(N)$ is the average sample number.


The FIXED procedure:

In this scheme, the $k$ categories are tested cyclically in the order $1, 2, \ldots, k, 1, \ldots$. After a total of $n$ tests, the category with the greatest experimental mean is selected, counting the experimental mean as $\theta_{i0}$ for any untested category. For $n = km$, where $m$ is an integer representing the number of times each category is tested, a simple expression for $E(L)$ can be determined for the case where $\theta_i$ is standard normal and $s_{ij} = s$ for all $i, j$, as follows. Harter [7] has tabulated $\mu_k$ = (average of the largest of $k$ independent unit normals), so $\mu_k$ is the best average gain achievable with perfect knowledge. Since $m$ observations with variance $s^2$ are equivalent to one observation with variance $s^2/m$, each category has variance

$\sigma_{i,km}^2 = (s^2/m)/(s^2/m + 1) = s^2/(s^2 + m) \equiv \sigma^2$

associated with it after $km$ observations, from (2) and (3). Since $\theta_i$ is standard normal and, given the observations, normal with mean $\theta_{i,km}$ and variance $\sigma^2$, $\theta_{i,km}$ must be normal with mean 0 and variance $1 - \sigma^2$ (the variance of $\theta_i$ is the variance of $\theta_{i,km}$ plus $\sigma^2$). The expected value of the largest of the $\theta_{i,km}$ is therefore $\mu_k\sqrt{1 - \sigma^2}$, and hence

(8) $E(L) = \mu_k\left(1 - \sqrt{1 - \sigma^2}\right)$.

For $k = 10$ and $s = .5$, where $\mu_{10} = 1.53875$, this reduces to

(9) $E(L) = 1.53875\left(1 - \sqrt{m/(m + .25)}\right)$.

Formula (9) is consistent with the FIXED curve in Figure 1, with $m = 1$ corresponding to $E(N) = 10$, etc. The FIXED curve was obtained by simulation, like all the others.
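Formula (8) needs only μ_k, which can also be obtained by numerical integration instead of from Harter's table. A minimal sketch, assuming Simpson's rule on [-10, 10] is accurate enough (it reproduces μ₁₀ ≈ 1.53875 and μ₂ = 1/√π); the helper names `mu_k` and `fixed_bayes_risk` are hypothetical.

```python
import math


def mu_k(k, lo=-10.0, hi=10.0, steps=20000):
    """Mean of the largest of k independent unit normals (Harter's mu_k),
    computed as the integral of x * k * phi(x) * Phi(x)**(k-1) by Simpson's rule."""
    def integrand(x):
        phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        cdf = 0.5 * math.erfc(-x / math.sqrt(2.0))
        return x * k * phi * cdf ** (k - 1)

    h = (hi - lo) / steps                      # steps must be even
    total = integrand(lo) + integrand(hi)
    for j in range(1, steps):
        total += (4.0 if j % 2 else 2.0) * integrand(lo + j * h)
    return total * h / 3.0


def fixed_bayes_risk(k, s, m):
    """(8): E(L) for FIXED after m tests per category, standard normal prior."""
    sigma2 = s * s / (s * s + m)               # posterior variance from (2)-(3)
    return mu_k(k) * (1.0 - math.sqrt(1.0 - sigma2))


for m in range(1, 6):                          # E(N) = 10, 20, ..., 50
    print(10 * m, round(fixed_bayes_risk(10, 0.5, m), 4))
```

With k = 10 and s = .5 this is formula (9) evaluated at m = 1, ..., 5.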


The PAULSON procedure:

Paulson's [10] procedure irrevocably eliminates categories until only one is left, testing all surviving categories at each stage. After $r$ stages, let $\bar Z_i$ be the average of the $r$ measurements that have been made on each category $i$ that survived the first $r - 1$ stages, and let $\bar Z_*$ be the largest of these. If $\bar Z_i < \bar Z_* + \lambda - a_\lambda/r$, category $i \ne *$ is eliminated at the $r$th stage. The maximum number of stages is clearly $a_\lambda/\lambda$ rounded up to the next integer, since by then all categories except the largest have been eliminated.

Paulson's procedure has two parameters, $\Delta$ and $\alpha$. He shows in [10] that if $s_{ij} = s$ for all $i, j$, and if $a_\lambda = [s^2/(\Delta - \lambda)]\log((k - 1)/\alpha)$, then his procedure will select the category with the largest mean with probability at least $1 - \alpha$, provided the largest mean exceeds the next largest by at least $\Delta > 0$, for any $\lambda$ in the interval $(0, \Delta)$.

We take Paulson's recommendation [11] and set $\lambda = (3/8)\Delta$. The procedure PAULSON has $\alpha = .1$, which leaves one parameter ($\Delta$) free. $E(L)$ increases with $\Delta$ and $E(N)$ decreases with $\Delta$; the curves labelled PAULSON in Figures 1-3 were generated parametrically by varying $\Delta$. Since PAULSON tests each category at least once, $E(L)$ is not defined for $E(N) < 10$ in our examples. Limited testing with $\alpha \ne .1$ did not reveal a significantly better value for $\alpha$ over the range of $E(N)$ considered.
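The elimination rule can be sketched as follows for a single replication. The sketch assumes a common error variance s², simulated normal test results, and a natural logarithm in a_λ; it keeps the current leader * explicitly, so that only categories i ≠ * are eliminated. The names are hypothetical, and this is not Paulson's own code.

```python
import math
import random


def paulson(theta, s, delta, alpha=0.1, lam_frac=3.0 / 8.0, rng=None):
    """One replication of Paulson's elimination procedure as described above (k >= 2).

    theta holds the true means; each test on category i is simulated as
    theta[i] + N(0, s^2).  lambda = lam_frac * delta, and a_lambda uses the
    natural logarithm (an assumption about the 'log' in the text).
    Returns (selected index, total number of tests).
    """
    rng = rng or random.Random()
    k = len(theta)
    lam = lam_frac * delta
    a_lam = s * s / (delta - lam) * math.log((k - 1) / alpha)
    alive = list(range(k))
    sums = [0.0] * k
    tests, r = 0, 0
    while len(alive) > 1:
        r += 1
        for i in alive:                        # stage r: test every survivor once
            sums[i] += theta[i] + rng.gauss(0.0, s)
            tests += 1
        avg = {i: sums[i] / r for i in alive}  # Zbar_i
        star = max(alive, key=avg.get)         # category attaining Zbar_*
        cut = avg[star] + lam - a_lam / r
        # Eliminate i != * with Zbar_i < Zbar_* + lambda - a_lambda / r.
        alive = [i for i in alive if i == star or avg[i] >= cut]
    return alive[0], tests


# Rough estimates of E(L) and E(N) in the setting of Figure 1 for one value of Delta.
rng = random.Random(3)
results = []
for _ in range(200):
    theta = [rng.gauss(0.0, 1.0) for _ in range(10)]
    pick, n_tests = paulson(theta, 0.5, delta=0.5, rng=rng)
    results.append((max(theta) - theta[pick], n_tests))
print(sum(l for l, _ in results) / 200, sum(n for _, n in results) / 200)
```

Repeating the estimate for a range of Δ traces out a PAULSON curve of E(L) against E(N).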

The STEIN procedure:

Reference [13] is reproduced in its entirety below.

"Suppose $X_{ij}$, $i = 1, \ldots, p$; $j = 1, 2, \ldots$ are independently normally distributed with means $\xi_i + \eta_j$ and variances $\sigma_j^2$, where $\xi_i, \eta_j$ are unknown but $\sigma_j^2$ are known. $\epsilon, \alpha$ are fixed numbers, with $0 < \epsilon$, $0 < \alpha < 1$. It is desired to select, by a sequential procedure in which we take first the observations with second subscript 1, etc., an integer $M$ among $1, \ldots, p$ such that for every $k = 1, \ldots, p$ and $\xi_1, \ldots, \xi_p, \eta_1, \eta_2, \ldots$ satisfying $\xi_k \ge \xi_\rho + \epsilon$ for all $\rho \ne k$, $P(M = k) \ge 1 - \alpha$. In accordance with the following rule, one decides at each stage (after the observations with second subscript $n$) to take no more observations with certain first subscripts. For each $n = 1, 2, \ldots$ and each $t = 1, \ldots, p$ compute

n        _    €(t,  -  1)    9 


where $\bar X_j$ is the average of the observations with second subscript $j$ and $t_j$ is the number of such observations. Continue taking observations $X_{t,n+1}, \ldots$ for those $t$ for which this expression is greater than $(\ln\alpha)/\epsilon$, but not for the others. Eventually there will be at most one subscript $t = 1, \ldots, p$ for which one continues to take observations, and if there is one this is chosen to be $M$. If there is none, the $t$ for which the sum is largest is chosen to be $M$. This procedure is a straightforward application of the Lemma on p. 146 of Wald's Sequential Analysis, and generalizations can easily be found."

In our case $X_{ij} = \theta_i + W_{ij}$, where $W_{ij}$ is the experimental error, and $\eta_j = 0$ for all $j$. Stein's procedure has two parameters, $\alpha$ and $\epsilon$. Our procedure STEIN is Stein's with $\alpha = .1$; this leaves $\epsilon$ free to parametrically generate $E(L)$ vs. $E(N)$. As in the case of PAULSON, limited testing did not reveal a significantly better value for $\alpha$ over the range of $E(N)$ considered.

Results

Figure 1 shows $E(L)$ vs. $E(N)$ for the five competing procedures. In all cases $k = 10$, $s_{ij} = .5$ for all $i, j$, and $\theta_{i0} = 0$ and $\sigma_{i0} = 1$ for all $i$. The random variables $\theta_i$ and $W_j$ were generated as assumed by NOMBAS using the LLRANDOM random number generator [9]. Note that NOMBASN dominates FIXED and that NOMBASG dominates all other procedures in this example. Results are based on 5000 replications in all cases; a 68% confidence interval is shown in the shape of an I for a set of points that is incomplete but hopefully large enough to indicate sampling variability without cluttering the figure. An additional run was made for a procedure called NOMBASG2, in which all random variables were generated as above but NOMBASG was given $\sigma_{i0} = 2$ for all $i$. The curve for NOMBASG2 was indistinguishable from the curve for NOMBASG, indicating that the typical robustness of Bayesian procedures with respect to assumptions about the prior holds in this case.

Figure 2 shows the effect of making the random variables $\theta_i$ exponential with mean 1, while setting $\theta_{i0} = \sigma_{i0} = 1$ in NOMBAS. The five procedures dominate each other in the same order as in Figure 1, except that STEIN is now better than NOMBASN. This is evidence that NOMBAS is robust with respect to the shape as well as the scale of the prior.

Figure 3 shows a comparison of the five procedures in attempting to select the Poisson distribution with the greatest mean. The means of the 10 Poisson distributions were taken to be exponential with mean 4, while setting $\sigma_{i0} = \theta_{i0} = 4$ in NOMBAS. Since the variance of a Poisson random variable is the same as its mean, whereas NOMBAS assumes the parameter $s_{ij}$ to be given independently of the mean, there is clearly no logical way to determine $s_{ij}$ in this case. It was decided to set $s_{ij} = 2$ for all $i, j$, on the grounds that the means are all "roughly" 4, and $\sqrt{4} = 2$. This thinking is imprecise, but that is really the point: NOMBAS appears to be robust with respect to problems of this type. Figure 3 shows that the order of dominance is as in Figure 2.

One might at this point entertain the hypothesis that NOMBASN and NOMBASG are actually optimal: NOMBASN minimizing average loss within the class of procedures where the number of tests is fixed, and NOMBASG minimizing average loss within the class where the number of tests is fixed on the average. These hypotheses are false. The next section documents a counterexample; it can be skipped without loss of continuity if the reader desires.

NOMBAS is not optimal

We first give an example showing that NOMBASN is not optimal when $n = 2$. Suppose $k = 3$, $\sigma_0 = (\sqrt{2}, 1/\sqrt{2}, 0)$, $\theta_0 = (0, 1, 1)$, and $s_{ij} = 1$ for all $i, j$. The first category has a small mean and a large variance, the second has a large mean and a small variance, and the third should never be tested because $\sigma_{30} = 0$. Using (4) with $\delta_{10} = 1$, $\delta_{20} = 0$, $\tilde\sigma_{10} = 2/\sqrt{3}$, and $\tilde\sigma_{20} = 1/\sqrt{6}$, we find that $g_{10} = .123$ and $g_{20} = .162$, so category 2 should be tested if $n = 1$, and would be the first category tested by NOMBASN in any case. Let $\theta_{21}$ be the mean of $\theta_2$ given the result of this test, and let $g(\theta_{21})$ be the difference (average gain from making the second test on category 1) $-$ (average gain from making the second test on category 2). Then, since $\tilde\sigma_{21} = 1/\sqrt{12}$,

(10) $g(\theta_{21}) = \begin{cases} g_{10} - \frac{1}{\sqrt{12}}\,F\big((1 - \theta_{21})\sqrt{12}\big) & \text{if } \theta_{21} < 1, \\ \frac{2}{\sqrt{3}}\,F\big(\theta_{21}\sqrt{3}/2\big) - \frac{1}{\sqrt{12}}\,F\big((\theta_{21} - 1)\sqrt{12}\big) & \text{if } \theta_{21} \ge 1. \end{cases}$

Since $F(\cdot)$ is decreasing, the minimum of $g(\theta_{21})$ when $\theta_{21} \le 1$ is $g(1) \approx .008$, which is positive. $g(\theta_{21})$ is also positive for $\theta_{21} \ge 1$, since it is asymptotically 0 and has a unique critical point (a maximum) at $\theta_{21} = 4/3$. So NOMBASN will make the second test on category 1 regardless of the outcome of the first test on category 2.
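The two claims about (10), positivity and the maximum at θ₂₁ = 4/3, are easy to confirm numerically on a grid. The following sketch assumes the exact F of (11) and an arbitrary grid on [-5, 6]; it is a numerical check, not a proof, and the helper names are hypothetical.

```python
import math


def F(y):
    """F(y) = phi(y) - y * (1 - Phi(y)), eqs. (7)/(11)."""
    phi = math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)
    return phi - y * 0.5 * math.erfc(y / math.sqrt(2.0))


SQ3, SQ12 = math.sqrt(3.0), math.sqrt(12.0)
g10 = (2.0 / SQ3) * F(SQ3 / 2.0)     # (4): sigma~_10 = 2/sqrt(3), delta_10 = 1


def g_diff(t):
    """(10): gain of a second test on category 1 minus that on category 2, theta_21 = t."""
    if t < 1.0:
        return g10 - F((1.0 - t) * SQ12) / SQ12
    return (2.0 / SQ3) * F(t * SQ3 / 2.0) - F((t - 1.0) * SQ12) / SQ12


low = [-5.0 + 0.001 * j for j in range(6000)]    # theta_21 in [-5, 1)
high = [1.0 + 0.001 * j for j in range(5001)]    # theta_21 in [1, 6]
min_gap = min(g_diff(t) for t in low + high)     # smallest difference on the grid
t_max = max(high, key=g_diff)                    # critical point on [1, 6]
print(round(g10, 3), round(g_diff(1.0), 4), min_gap > 0, round(t_max, 3))
```

The smallest difference on the grid occurs at θ₂₁ = 1 and is about .008, and the grid maximizer on [1, 6] sits next to 4/3.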

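The procedure (call it P) that tests the categories in the order 1, 2 is equivalent to NOMBASN, since the two procedures make the same two tests and the final selection depends only on their results, not on their order. Now consider the procedure P' that first tests 1 and then tests the category with the largest gain. If $\theta_{11} = 1$, all three categories have current mean 1, so every $\delta_{i1}$ is 0; since $\tilde\sigma_{11} = 2/\sqrt{15}$ exceeds $1/\sqrt{6}$, the value of $\tilde\sigma$ for the still untested category 2, P' will then test 1 again. By continuity the same is true for $\theta_{11}$ close to 1, so P' will test 1 again with positive probability. Because the second test of P' is the one-step optimal choice, it is never worse than the second test of P and is strictly better on that event, so P' is strictly better than NOMBASN. This establishes that NOMBASN is not optimal in general. Essentially the same example can be used to show the non-optimality of NOMBASG, since NOMBASG can be forced to make exactly two tests by selecting a gain cutoff $g$ that is so small that at least two tests will be made, while simultaneously assuming that $s_{ij}$ is so large for $j > 2$ that at most two tests will be made. The possibility remains that NOMBASG might be optimal for the case where $s_{ij}$ does not depend on $j$, but NOMBASG is not optimal in general.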

Practical Considerations

The fact that NOMBASG dominates all other procedures in the sense we have described is not necessarily conclusive, even for problems that closely resemble the example we have used. NOMBASG is Bayesian and sequential, so the usual arguments about Bayesian vs. traditional and sequential vs. non-sequential decision procedures apply. It is not our intention to resurrect those arguments here. However, NOMBAS has some unique difficulties that should be appreciated by anyone tempted to use it.

NOMBAS makes tests one at a time. This is the source of its power, but it is also potentially a source of difficulty. Making tests in batches may have advantages in terms of speed, cost, or constancy of experimental conditions. Any of these factors could be decisive in a given application. However, we suggest that one class of applications where these factors are typically absent is the selection of the best of several large Monte Carlo computer simulations; in fact, it was just such an application that suggested the NOMBAS procedure in the first place. In that application, ten different Monte Carlo simulations (actually one computer program with ten different sets of gun parameters) were available of a defensive gun being attacked by a large number of attackers. The intention was to select the gun that destroyed, on average, the greatest number of attackers before being overwhelmed. The process of writing and debugging the program provided the initial estimates required.

A critical problem in the use of NOMBASG is the selection of the parameter $g$. It might be reasonable to ask the experimenter to estimate the amount of gain $g'$ in the selected mean that would be just marginally worth the cost of a single test, i.e., the absolute slope of the $E(L)$ vs. $E(N)$ curve at the desired $E(N)$. Unfortunately, there is usually a great difference between $g'$ and $g$. To obtain the point where $E(N) = 30$ in Figure 1, for example, it is necessary

_Q 

to  take  g  =  1.3  x  10   .   The  absolute  slope  of  the  NOMBASG  graph  of  E(L)  vs 
$E(N)$ at that point is $g' = 5.2 \times 10^{-4}$. The great disparity between these two numbers is connected with the fact that the sequence $\max_i g_{ij}$ is typically not monotonically decreasing in $j$; i.e., the fact that a large gain is not likely on the current trial does not rule out the possibility in the future. Unfortunately, this "explanation" provides no rule of thumb by which $g$ might be obtained from $g'$. Only a qualitative statement can be made: NOMBASG is remarkably reluctant to make tests, and therefore most experiments should be made with a remarkably small value of $g$. The only redeeming feature is that NOMBASG is not very sensitive to $g$ anyway; Figure 1 shows that changes of several orders of magnitude in $g$ are required to increase $E(N)$ from 30 to 40 or decrease $E(N)$ from 30 to 20.

In many cases, the experimenter may have a rough idea of how many tests should be performed, as well as some possibly conflicting feelings about acceptable terminal states. For such an experimenter we suggest the following NOMBAS procedure, which capitalizes on the fact that NOMBASN and NOMBASG make tests in the same order, and that the Bayes calculations (1)-(3) are valid even if the tests are not performed in NOMBAS order.

1. Make the required estimates of $\theta_{i0}$, $\sigma_{i0}$, and $s_{ij}$; $i = 1, \ldots, k$, $j \ge 1$. Typically, $s_{ij}$ will not depend on $j$.

2. Perform a small number $j$ of tests. These tests could be made in NOMBAS order or, in case the idea of being "fair" to all categories is important, they could be spread evenly over the categories. Use equations (1)-(3) for each test, and also (4)-(7) if NOMBAS order is used. Calculate $\theta_{ij}$, $\sigma_{ij}$, and $g_{ij}$; $i = 1, \ldots, k$.

3. Examine the calculations to determine whether testing should be continued. The runners-up to the largest of the $g_{ij}$ should not be ignored (as NOMBAS does); the presence of close runners-up is a motive for continuation. The fact that $\theta_{ij}$ and $\sigma_{ij}$ have well-defined meanings should be an aid in making the decision. If no further testing is appropriate, select the largest of the $\theta_{ij}$. Otherwise, return to step 2.

The above procedure is intended to be a compromise between NOMBASN and NOMBASG, and is probably somewhere between them in effectiveness.
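Steps 2 and 3 need nothing more than repeated use of (1)-(3), and the posterior does not depend on the order in which the tests were made. The sketch below illustrates this with hypothetical data (four categories, three evenly spread tests each, s = .5, standard normal priors); the helper names are hypothetical, and the g_ij of step 2 would be added using (4)-(7).

```python
import math
import random


def bayes_update(mean, var, z, s2):
    """(1)-(3) for a single test with error variance s2."""
    rho = s2 / (s2 + var)
    return rho * mean + (1.0 - rho) * z, rho * var


def posterior(prior_means, prior_vars, results, s2):
    """Apply (1)-(3) to (category, Z) pairs in the order given; return theta_ij, sigma^2_ij."""
    means, variances = list(prior_means), list(prior_vars)
    for i, z in results:
        means[i], variances[i] = bayes_update(means[i], variances[i], z, s2)
    return means, variances


# Hypothetical step 2: four categories, three tests each, spread evenly ("fair").
rng = random.Random(7)
theta = [rng.gauss(0.0, 1.0) for _ in range(4)]             # unknown true means
batch = [(i, theta[i] + rng.gauss(0.0, 0.5)) for _ in range(3) for i in range(4)]
m1, v1 = posterior([0.0] * 4, [1.0] * 4, batch, 0.25)
m2, v2 = posterior([0.0] * 4, [1.0] * 4, batch[::-1], 0.25)  # same tests, reverse order

# Step 3 input: every category ranked, runners-up included, with its posterior sd.
for i in sorted(range(4), key=lambda i: m1[i], reverse=True):
    print(i, round(m1[i], 3), round(math.sqrt(v1[i]), 3))
```

Because m1 and m2 agree (up to rounding), a batch made for speed or fairness can later be continued in NOMBAS order without redoing any calculations.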

The fact that NOMBAS is a Bayesian procedure has some practical advantages. Suppose that category $*$ were revealed to be best after a limited amount of testing. This might prompt a closer examination of category $*$, and it might turn out that category $*$ was tested incorrectly; an error in coding might be the reason if $*$ were a computer simulation. If the other categories were not in error, then the experiment could be continued by correcting the error in $*$, resetting $\theta_{*j}$ and $\sigma_{*j}$ to $\theta_{*0}$ and $\sigma_{*0}$, and then continuing to make tests in NOMBAS order. The testing already done on non-$*$ categories would not have to be wasted by starting the whole experiment over, and the experiment could be continued using the originally intended logic.

Finally, and to the extent that general conclusions are justified by experiments such as those we have described:

1. If the number of tests must be fixed, then NOMBASN is substantially better than FIXED.

2. If a sequential experiment is acceptable, and if NOMBAS is rejected on account of its Bayesian origins, then PAULSON is better than STEIN.

The function F(y)

It is not difficult to show that the function $F(y)$ defined in (7) can be expressed as

(11) $F(y) = \int_y^\infty (x - y)\,d\Phi(x) = \phi(y) - y\,(1 - \Phi(y))$,

where $\phi$ is the standard normal density, since the right- and left-hand sides are both asymptotically 0 and have the same derivative, $-(1 - \Phi(y))$, with respect to $y$. Since the cumulative normal function $\Phi(y)$ is widely tabulated, this provides a ready means of evaluation. However, for large $y$ the right-hand side of (11) is the difference of two small and very nearly equal quantities, which is numerically unfortunate. To get around this difficulty, write (11) as

(12) $F(y) = \phi(y)\,(1 - yR(y))$,

where $R(y) = (1 - \Phi(y))/\phi(y)$ is Mills' ratio. For $y \ge 0$ (the only case needed, since $\delta_{ij} \ge 0$ in (4)), Mills' ratio satisfies the following inequality [12]:

(13) $2/\left(y + \sqrt{y^2 + 2b_\infty}\right) < R(y) \le 2/\left(y + \sqrt{y^2 + 2b_0}\right)$,

where $b_0 = 4/\pi$ and $b_\infty = 2$. Let

(14) $b(y) = \dfrac{8/\pi + 2.36y + y^2}{2 + .5(2.36y + y^2)}$.

Then $b(0) = b_0$ and $b(\infty) = b_\infty$ regardless of the parameter that is 2.36 in (14), which means that the function

(15) $\tilde R(y) = 2/\left(y + \sqrt{y^2 + 2b(y)}\right)$

is a good approximation to $R(y)$ for large and small $y$. The parameter 2.36 was selected to give a good fit over the midrange, and the function

(16) $\tilde F(y) \equiv \phi(y)\,(1 - y\tilde R(y))$

was used as an approximation to $F(y)$ in all computations reported here. Some algebra shows that

(17) $\tilde F(y) = 2\phi(y)\,b(y)/\left(y + \sqrt{y^2 + 2b(y)}\right)^2$,

which eliminates the need to take the difference of two small and nearly equal quantities. The relative error $|\tilde F(y) - F(y)|/F(y)$ never exceeds .003. Given the apparent robustness of NOMBAS, it is likely that simpler approximations to $F(y)$ than (17) would be adequate.
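The approximation can be checked against the exact expression (11) in a few lines of Python. This sketch assumes `math.erfc` is accurate enough to serve as the reference on 0 ≤ y ≤ 20; it computes the largest relative error of (17) on a grid and verifies the bounds (13) on 0 ≤ y ≤ 10. The helper names are hypothetical.

```python
import math


def phi(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def mills(y):
    """Mills' ratio R(y) = (1 - Phi(y)) / phi(y), via math.erfc."""
    return 0.5 * math.erfc(y / math.sqrt(2.0)) / phi(y)


def F_exact(y):
    """(11): phi(y) - y * (1 - Phi(y)); the subtraction loses digits as y grows."""
    return phi(y) - y * 0.5 * math.erfc(y / math.sqrt(2.0))


def b(y):
    """(14): b(0) = 4/pi = b_0 and b(y) -> 2 = b_inf as y -> infinity."""
    u = 2.36 * y + y * y
    return (8.0 / math.pi + u) / (2.0 + 0.5 * u)


def F_approx(y):
    """(17): 2 phi(y) b(y) / (y + sqrt(y^2 + 2 b(y)))^2, for y >= 0; no cancellation."""
    by = b(y)
    return 2.0 * phi(y) * by / (y + math.sqrt(y * y + 2.0 * by)) ** 2


grid = [0.01 * j for j in range(2001)]          # y in [0, 20]
worst = max(abs(F_approx(y) - F_exact(y)) / F_exact(y) for y in grid)

# Bounds (13) for 0 <= y <= 10: lower uses b_inf = 2, upper uses b_0 = 4/pi
# (the upper bound is attained at y = 0, hence the small tolerance).
bounds_ok = all(
    2.0 / (y + math.sqrt(y * y + 4.0)) < mills(y) <= 2.0 / (y + math.sqrt(y * y + 8.0 / math.pi)) + 1e-12
    for y in grid[:1001]
)
print(worst, bounds_ok)
```

The printed worst-case relative error can be compared directly with the .003 bound stated above.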
Figure 1: Selecting the largest of ten normally distributed means of normal random variables.

Figure 2: Selecting the largest of ten exponentially distributed means of normal random variables.

Figure 3: Selecting the largest of ten exponentially distributed means of Poisson random variables.